Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin-ext: Change PluginsKeyValueStorage to not continuously access the disk #12236

Merged
merged 17 commits into from
May 25, 2023

Conversation

marcdumais-work
Copy link
Contributor

@marcdumais-work marcdumais-work commented Feb 27, 2023

What it does

Fixes #8384

Added the use of a semaphore to lock access to the global plugin storage file, to avoid it being accessed in a disorderly fashion, causing potential corruption.

As per the issue discussion, it's possible that there is also a potential race condition caused by node's write function, not flushing its buffers to disk, but in this specific case, it was obvious that parallel calls to set() and get() were the main cause of the problems.

How to test

One way is to add ENTER/EXIT debug traces [1] to packages/plugin-ext/src/main/node/plugins-key-value-storage.ts, in set() and get() and then rebuild , run the browser tests ('yarn browser test`), and confirm that the accesses are now matched. i.e. "ENTER get()" should always be followed by "EXIT get()", and "ENTER set()" always followed by "EXIT set().

[1]

diff --git a/packages/plugin-ext/src/main/node/plugins-key-value-storage.ts b/packages/plugin-ext/src/main/node/plugins-key-value-storage.ts
index 7200ce2e049..c0a7e91a325 100644
--- a/packages/plugin-ext/src/main/node/plugins-key-value-storage.ts
+++ b/packages/plugin-ext/src/main/node/plugins-key-value-storage.ts
@@ -57,6 +57,7 @@ export class PluginsKeyValueStorage {
     }
 
     async set(key: string, value: KeysToAnyValues, kind: PluginStorageKind): Promise<boolean> {
+        console.log('ENTER set()');
         try {
             await lock.acquire();
             const dataPath = await this.getDataPath(kind);
@@ -77,10 +78,12 @@ export class PluginsKeyValueStorage {
             return true;
         } finally {
             lock.release();
+            console.log('EXIT set()');
         };
     }
 
     async get(key: string, kind: PluginStorageKind): Promise<KeysToAnyValues> {
+        console.log('ENTER get()');
         try {
             await lock.acquire();
             const dataPath = await this.getDataPath(kind);
@@ -91,10 +94,12 @@ export class PluginsKeyValueStorage {
             return data[key];
         } finally {
             lock.release();
+            console.log('EXIT get()');
         };
     }
 
     async getAll(kind: PluginStorageKind): Promise<KeysToKeysToAnyValue> {
+        console.log('ENTER getAll()');
         try {
             await lock.acquire();
             const dataPath = await this.getDataPath(kind);
@@ -104,6 +109,7 @@ export class PluginsKeyValueStorage {
             return this.readFromFile(dataPath);
         } finally {
             lock.release();
+            console.log('EXIT getAll()');
         }
     }

Review checklist

Reminder for reviewers

@paul-marechal
Copy link
Member

paul-marechal commented Feb 28, 2023

I pushed changes to use another semaphore library that is already used by Theia: async-mutex.

I also cleaned up the class to get rid of the asynchronous postConstruct as this is a mistake when using Inversify 4.

Lastly, is there a way to write browser tests for this fix? From what I understand this class is quite nested in the plugin system infrastructure, so if it's too difficult writing integration tests please write unit tests then.

Here's an example for a unit test I think could be good:

const tasks = [];
for (let i = 0; i < 1000; i++) {
    tasks.push(storage.set('test_value', i));
}
await Promise.allSettled(tasks);
assert.deepStrictEqual(
    await storage.get('test_value'),
    999,
    'the last value being set should have been 999'
);

edit: Updated the snippet to something that should work

Previously the result of this snippet would have been non deterministic?

@paul-marechal
Copy link
Member

paul-marechal commented Feb 28, 2023

The last commit I pushed adds logic to use unique mutexes based on the file path being written to. It seems to be something that can happen based on the kind parameter passed to several methods. This is better than locking access to unrelated files by using the same global lock.

@colin-grant-work
Copy link
Contributor

One thing to consider, at least re: writing: there's already something doing a somewhat similar job for our preference files on the frontend in packages/preferences/src/browser/preference-transaction-manager.ts. The only thing missing for it to work on the backend is a backend equivalent to the MonacoJSONCEditor in packages/preferences/src/browser/monaco-jsonc-editor.ts

@paul-marechal
Copy link
Member

@colin-grant-work is it me or the component you linked seems hyper-specific to preferences?

The FileSystemLocking service introduced here is mostly a generic mutex bag and doesn't make any assumption about what you might do while having exclusive access (use the node APIs or don't)

Copy link
Contributor

@colin-grant-work colin-grant-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the lifecycle of this system: it seems to me we only need to read from the file once at startup, and after that, all changes can be kept in memory and written to the underlying storage only at the end of the session (or periodically to avoid breakdown). That should mean that we never have a problem with concurrent reading and writing: before we write anything, we make sure we've read, and thereafter we never need to read again.

@colin-grant-work
Copy link
Contributor

@colin-grant-work is it me or the component you linked seems hyper-specific to preferences?

Yes, but it would do the job that we need to do for the globalState mechanism, since that also just manages a JSON file.

@paul-marechal
Copy link
Member

it seems to me we only need to read from the file once at startup, and after that, all changes can be kept in memory and written to the underlying storage only at the end of the session (or periodically to avoid breakdown).

I've thought about it too, but it would require heavily modifying this component. The way it currently operates might not be the most efficient you are right, and the current fix was the lowest effort required to patch the issue.

I'll look into storing key/values in memory and periodically synchronizing on disk.

@colin-grant-work
Copy link
Contributor

I also think that @tsmaeder's point about this being an OS-level problem merits serious consideration, because two Theia instances running on the same machine will both want access to the same files, and as long as we're not using the single-instance lock to ensure that we only have one backend, we've got no guarantee that they won't be operating on the file at the same item. It also increases the urgency of keeping the file up to date with regular writes, because the second instance would want to have an up-to-date version of the data when it starts up. But that means that all changes need to be written promptly to disk, and that means that the probability of OS-level concurrent access is non-negligible.

@paul-marechal
Copy link
Member

Looking at "system file locking" was inconclusive, and only the proper-lockfile package looks usable which would prevent two Theia instances from conflicting. Other processes can still mess up with our files, but at this point it's out of scope.

It then depends on what kind of behaviour we want: The current approach assumes any update is directly written on disk and visible by subsequent accesses. I'm not sure of the benefits from this approach.

@paul-marechal
Copy link
Member

Well, proper-lockfile doesn't seem to work when the file doesn't yet exist.

@paul-marechal
Copy link
Member

@colin-grant-work I used the strategy you mentioned where we use an in-memory cache first for the values, and only sync with the disk every so often. The sync delay is long enough with a bit of randomness as to avoid collisions with other Theia instances. It might not be fool-proof but it's arguably better than the previous implementation. I think we'll have a long and happy life before we'll get to see problems with this new implementation, but I might be wrong.

@marcdumais-work
Copy link
Contributor Author

Well, proper-lockfile doesn't seem to work when the file doesn't yet exist.

Have you tried to disable the realPath option? Else creating an empty file first, if/when it doesn't yet exists, looks like a viable option (the file itself is not used as the lock, so this should not introduce race conditions).

@marcdumais-work
Copy link
Contributor Author

The scope of the PR has changed quite a bit, so probably the title/description should be updated accordingly.

@paul-marechal paul-marechal changed the title [plugin-ext] Add file locking to avoid race condition writing to global plugin storage file plugin-ext: Change PluginsKeyValueStorage to not continuously access the disk Mar 8, 2023
@paul-marechal
Copy link
Member

paul-marechal commented Apr 18, 2023

To @msujew, @JonasHelming, can someone from either TypeFox or EclipseSource review this patch?

Summary of the changes so far: Don't access the disk for each read/write access and instead synchronize on disk every so often. Prevention against multiple Theia instances trying to use the same file is postponed until someone else is free to implement it as it should be easy to add later if we need it (just update the lockPath implementation).

@msujew msujew self-requested a review April 18, 2023 20:52
@vince-fugnitto vince-fugnitto modified the milestone: 1.37.0 Apr 27, 2023
marcdumais-work and others added 8 commits May 1, 2023 12:44
…al plugin storage file

Fixes #8384

Added the use of a semaphore to lock access to the global plugin storage file, to avoid it
being accessed in a disorderly fashion, causing potential corruption.

As per the issue discussion, it's possible that there is also a potential race condition
caused by node's write function, not flushing its buffers to disk, but in this specific case,
it was obvious that parallel calls to set() and get() were the main cause of the problems.

Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>
Inversify 4 doesn't really support async postConstruct methods.
Also put a guard around the folder initialization before writing to the
`global-state.json` file.
Turns out the `PluginsKeyValueStorage` implementation handles more than
just the one `global-state.json` file, so we need to make sure to guard
against concurrent access to these files too.
Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>
The native Node file system APIs are powerful but low level: Nothing
prevents bad code from concurrently writing to the same file.

`FileSystemLocking` is meant to be a central component for any method
handling files and wanting to prevent concurrent access by other parts
of the framework.
Instead of reading/writing on each get/set, use an in-memory cache and
synchronize the data on disk every so often.
Copy link
Member

@vince-fugnitto vince-fugnitto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

  • confirmed that the code looks good
  • confirmed that the unit-tests pass successfully
  • confirmed that the global-state.json error no longer occurs

Copy link
Member

@msujew msujew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me as well 👍

@paul-marechal paul-marechal merged commit 2138dac into master May 25, 2023
@paul-marechal paul-marechal deleted the global-storage-lock branch May 25, 2023 00:18
@github-actions github-actions bot added this to the 1.38.0 milestone May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[python] plugin-storage/global-state.json: Unexpected end of JSON input
5 participants